Techniques for combining human and machine learning in natural language processing

ABSTRACT

Methods, apparatuses and computer readable medium are presented for generating a natural language model. A method for generating a natural language model comprises: receiving more than one annotation of a document; calculating a level of agreement among the received annotations; determining that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determining an aggregated annotation representing an aggregation of information in the received annotations and training a natural language model using the aggregated annotation, when the first criterion is satisfied; generating at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and discarding the received annotations from use in training the natural language model, when the third criterion is satisfied.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/185,843, filed Nov. 9, 2018, and titled “TECHNIQUES FOR COMBINING HUMAN AND MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING,” which is a continuation of U.S. patent application Ser. No. 14/964,528, filed Dec. 9, 2015, and titled “TECHNIQUES FOR COMBINING HUMAN AND MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING,” which claims the benefits of U.S. Provisional Application 62/089,736, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR ANNOTATING NATURAL LANGUAGE PROCESSING,” U.S. Provisional Application 62/089,742, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR IMPROVING MACHINE PERFORMANCE IN NATURAL LANGUAGE PROCESSING,” U.S. Provisional Application 62/089,745, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR IMPROVING FUNCTIONALITY IN NATURAL LANGUAGE PROCESSING,” U.S. Provisional Application 62/089,747, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR SUPPORTING NATURAL LANGUAGE PROCESSING,” the disclosures of which are incorporated herein in their entireties and for all purposes.

This application is also related to U.S. patent application Ser. No. 14/964,517, filed Dec. 9, 2015, and titled “METHODS FOR GENERATING NATURAL LANGUAGE PROCESSING SYSTEMS,” U.S. patent application Ser. No. 14/964,518, filed Dec. 9, 2015, and titled “ARCHITECTURES FOR NATURAL LANGUAGE PROCESSING,” U.S. patent application Ser. No. 14/964,520, filed Dec. 9, 2015, and titled “OPTIMIZATION TECHNIQUES FOR ARTIFICIAL INTELLIGENCE,” U.S. patent application Ser. No. 14/964,522, filed Dec. 9, 2015, and titled “GRAPHICAL SYSTEMS AND METHODS FOR HUMAN-IN-THE-LOOP MACHINE INTELLIGENCE,” U.S. patent application Ser. No. 14/964,510, filed Dec. 9, 2015, and titled “METHODS AND SYSTEMS FOR IMPROVING MACHINE LEARNING PERFORMANCE,” U.S. patent application Ser. No. 14/964,511, filed Dec. 9, 2015, and titled “METHODS AND SYSTEMS FOR MODELING COMPLEX TAXONOMIES WITH NATURAL LANGUAGE UNDERSTANDING,” U.S. patent application Ser. No. 14/964,512, filed Dec. 9, 2015, and titled “AN INTELLIGENT SYSTEM THAT DYNAMICALLY IMPROVES ITS KNOWLEDGE AND CODE-BASE FOR NATURAL LANGUAGE UNDERSTANDING,” U.S. patent application Ser. No. 14/964,525, filed Dec. 9, 2015, and titled “METHODS AND SYSTEMS FOR LANGUAGE-AGNOSTIC MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING USING FEATURE EXTRACTION,” and U.S. patent application Ser. No. 14/964,526, filed Dec. 9, 2015, and titled “METHODS AND SYSTEMS FOR PROVIDING UNIVERSAL PORTABILITY IN MACHINE LEARNING,” each of which are filed concurrently herewith, and the entire contents and substance of all of which are hereby incorporated in total by reference in their entireties and for all purposes.

TECHNICAL FIELD

The subject matter disclosed herein generally relates to processing data. In some example embodiments, the present disclosures relate to annotation aggregation techniques for natural language model generation.

BACKGROUND

There is a need for assisting customers or users to accurately and expediently process human communications brought about by the capabilities of the digital age. The modes of human communications brought about by digital technologies have created a deluge of information that can be difficult for human readers to handle alone. Companies and research groups may want to determine trends in the human communications to determine what people generally care about for any particular topic, whether it be what car features are being most expressed on Twitter®, what political topics are being most expressed on Facebook®, what people are saying about the customer's latest product in their customer feedback page, and so forth. It may be desirable for companies to aggregate and then synthesize the thousands or even millions of human communications from the many different modes available in the digital age (e.g., Twitter®, blogs, email, etc.). Processing all this information by humans alone can be overwhelming and cost-inefficient. Methods today may therefore rely on computers to apply natural language processing in order to interpret the many human communications available in order to analyze, group, and ultimately categorize the many human communications into digestible patterns of communication.

While natural language processing techniques have been applied to try to process human communications, some methods are less efficient than others in that more time may be needed to develop models or techniques catered to a client's specific needs. Conventional methodologies for providing natural language processing to clients may take months to complete, for example. Therefore, it is desirable to improve methodologies for generating natural language models.

BRIEF SUMMARY

A method for generating a natural language model comprises: receiving more than one annotation of a document; calculating a level of agreement among the received annotations; determining that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determining an aggregated annotation representing an aggregation of information in the received annotations and training a natural language model using the aggregated annotation, when the first criterion is satisfied; generating at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and discarding the received annotations from use in training the natural language model, when the third criterion is satisfied.

According to some example embodiments, the second criterion is satisfied when the number of annotations received is less than a minimum number.

According to some example embodiments, the annotations of the document comprise selection of one or more portions of the document relevant to one or more topics.

According to some example embodiments, the annotations of the document comprise selection of one or more categories among a plurality of categories.

According to some example embodiments, the level of agreement is determined for each category based on a percentage of annotations that select said category.

According to some example embodiments, the first criterion is satisfied when the number of annotations received is at least a minimum number and the level of agreement for a category is at least a threshold level; and the aggregated annotation is determined as selecting or not selecting said category.

According to some example embodiments, the second criterion is satisfied when the number of annotations received is less than a maximum number and the level of agreement is less than a threshold level.

According to some example embodiments, the third criterion is satisfied when the number of annotations received is at least a maximum number and the level of agreement is less than a threshold level.
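The three criteria above, applied to a single category, can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the function name `majority_vote`, the default values of `min_n`, `max_n` and `threshold`, and the choice to measure agreement as the larger of the "select" and "not select" shares are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def majority_vote(
    votes: List[bool],
    min_n: int = 3,          # assumed minimum number of annotations
    max_n: int = 5,          # assumed maximum number of annotations
    threshold: float = 0.7,  # assumed agreement threshold
) -> Tuple[str, Optional[bool]]:
    """Decide what to do with the annotations of one category of one document.

    Each entry of `votes` is True if an annotator selected the category.
    Returns (decision, aggregated), where decision is one of
    "aggregate", "request_more" or "discard".
    """
    n = len(votes)
    if n < min_n:
        # Second criterion: too few annotations, prompt for more.
        return ("request_more", None)

    share = sum(votes) / n
    # Assumption: agreement is the share of the more popular answer,
    # so that "not selecting" can also be agreed upon.
    agreement = max(share, 1.0 - share)

    if agreement >= threshold:
        # First criterion: aggregate as selecting / not selecting.
        return ("aggregate", share >= 0.5)
    if n < max_n:
        # Second criterion: low agreement, but more annotations allowed.
        return ("request_more", None)
    # Third criterion: maximum reached and still no agreement.
    return ("discard", None)
```

For example, three annotators who all select the category yield an aggregated "select", whereas five annotators split three to two yield a discard under these assumed defaults.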

According to some example embodiments, a numerical value is assigned to each of the plurality of categories.

According to some example embodiments, the level of agreement comprises a difference between the highest numerical value and the lowest numerical value among the selected categories; the first criterion is satisfied when the difference is no more than a threshold value; and the third criterion is satisfied when the difference is more than the threshold value.

According to some example embodiments, the aggregated annotation is determined as selection of a category with the numerical value closest to a mean of the numerical values of all received annotations.

According to some example embodiments, the aggregated annotation is determined as selection of a category with the numerical value closest to a median of the numerical values of all received annotations.
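The numerical-value embodiments above can be illustrated with a short sketch. It is one possible reading rather than the definitive method: the function name `within_tolerance`, the integer scale, and the tie-breaking rule (the first scale value wins when two are equally close) are assumptions introduced for the example.

```python
from statistics import mean, median
from typing import List, Optional, Tuple


def within_tolerance(
    values: List[int],
    scale: List[int],
    tolerance: int,
    use_median: bool = False,
) -> Tuple[str, Optional[int]]:
    """Aggregate annotations whose categories carry numerical values.

    `values` holds the numerical value of the category chosen by each
    annotator, and `scale` lists the values of all available categories.
    """
    # Level of agreement: difference between highest and lowest value.
    spread = max(values) - min(values)
    if spread > tolerance:
        # Third criterion: annotators are too far apart.
        return ("discard", None)

    # First criterion: pick the category closest to the mean (or median).
    center = median(values) if use_median else mean(values)
    # Assumption: on a tie, the first value in `scale` is chosen.
    aggregated = min(scale, key=lambda v: abs(v - center))
    return ("aggregate", aggregated)
```

With a five-point scale and a tolerance of 1, the annotations 4, 5 and 5 are within tolerance and aggregate to 5, while 1, 5 and 3 are discarded.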

According to some example embodiments, determining that the criterion among the first criterion, the second criterion, and the third criterion is satisfied is further based on a result of an analysis of the document by one or more pre-existing natural language models.

According to some example embodiments, determining that the criterion among the first criterion, the second criterion, and the third criterion is satisfied is further based on known performance levels of annotators.

According to some example embodiments, at least one of the annotations received comprises prediction by a pre-existing natural language model.

An apparatus for generating a natural language model comprises one or more processors configured to: receive more than one annotation of a document; calculate a level of agreement among the received annotations; determine that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determine an aggregated annotation based on the received annotations and train a natural language model using the aggregated annotation, when the first criterion is satisfied; generate at least one human readable prompt configured to receive additional annotations of the document, when a second criterion is satisfied; and discard the received annotations from use in training the natural language model, when the third criterion is satisfied.

According to some example embodiments, the annotations of the document comprise selection of one or more categories among a plurality of categories.

According to some example embodiments, the level of agreement is determined for each category based on a percentage of annotations that select said category.

According to some example embodiments, a numerical value is assigned to each of the plurality of categories; the level of agreement comprises a difference between the highest numerical value and the lowest numerical value among the selected categories; the first criterion is satisfied when the difference is no more than a threshold value; and the third criterion is satisfied when the difference is more than a threshold value.

A non-transitory computer readable medium comprises instructions that, when executed by a processor, cause the processor to: receive more than one annotation of a document; calculate a level of agreement among the received annotations; determine that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determine an aggregated annotation based on the received annotations and train a natural language model using the aggregated annotation, when the first criterion is satisfied; generate at least one human readable prompt configured to receive additional annotations of the document, when a second criterion is satisfied; and discard the received annotations from use in training the natural language model, when the third criterion is satisfied.

According to some example embodiments, the annotations of the document comprise selection of one or more categories among a plurality of categories.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.

FIG. 1 is a flow chart illustrating an example method for generating a natural language model, according to some example embodiments.

FIG. 2 is a diagram illustrating an example of a classification task, according to some example embodiments.

FIG. 3 is a diagram illustrating an example of an extraction task, according to some example embodiments.

FIG. 4 is a diagram illustrating an example of evaluating the performance level of a trained model using cross validation, according to some example embodiments.

FIG. 5 is a diagram illustrating an example of topic modeling, according to some example embodiments.

FIG. 6 is a diagram illustrating an example of a confidence level, according to some example embodiments.

FIG. 7 is a diagram illustrating an example of a level of disagreement, according to some example embodiments.

FIG. 8 is a diagram illustrating an example of an annotator performance level, according to some example embodiments.

FIG. 9 is a flow chart illustrating an example method for generating a natural language model, according to some example embodiments.

FIG. 10 is a flow chart illustrating high level concepts of annotation aggregation, according to some example embodiments.

FIG. 11 is a flow chart illustrating an example method referred to as “Majority Vote,” according to some example embodiments.

FIG. 12 is a flow chart illustrating an example method referred to as “Within Tolerance,” according to some example embodiments.

FIG. 13 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium and perform any one or more of the methodologies discussed herein.

DETAILED DESCRIPTION

Example methods, apparatuses and computer readable medium are presented for providing annotation aggregation techniques during the process of generating a natural language model.

In natural language processing, machine learning techniques are often applied. Human annotations help the machine learning techniques resolve inevitable ambiguities in the human communications, as well as provide intelligence or meaning to communications that the machine does not accurately comprehend. The human annotations can then enable computers to provide better natural language processing results of the human communications.

One of the problems with existing natural language machine learning techniques is selection of samples to be annotated by a human. Generally, when more human communications samples are annotated by humans, the results of natural language processing tend to be more accurate. However, human annotation is time-consuming and expensive. Thus, there is a need for a method that selects a relatively small number of samples to be annotated by humans while still effectively providing annotations that sufficiently aid the machine learning techniques. The selected samples provide substantial benefit to the machine learning process, so that the accuracy of natural language processing can be improved with a small amount of human annotation.

Another problem with existing natural language machine learning techniques is combining annotations from different human annotators. Human annotations are not perfect. Various factors, such as the experience of human annotators and the quality of questions that the human annotators are asked to answer, have an impact on the accuracy of human annotations. Thus, there is a need to have multiple human annotators annotate the same sample, and combine these annotations to assist the machine learning process, so that the impact of the inaccuracy of individual annotators can be reduced.

Examples merely demonstrate possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.

I. Intelligent Queuing

Aspects of the present disclosure are presented to select a relatively small number of samples to be annotated by humans while still effectively providing annotations that sufficiently aid the machine learning techniques. The selected samples provide substantial benefit to the machine learning process, so that the accuracy of natural language processing can be improved with a small amount of human annotation. These aspects of the present disclosure are sometimes referred to as “Intelligent Queuing.”

a. General Description of Intelligent Queuing

FIGS. 1-4 provide a high level description of intelligent queuing, including example interfaces for interacting with a user in the intelligent queuing process, according to some embodiments.

Referring to FIG. 1, a flow chart illustrating an example method 100 for generating a natural language model is shown, according to some example embodiments. The example method 100 begins at step 110 with selecting a seed set of documents to be annotated. Step 110 may be conducted by one or more processors in a natural language platform. The seed set of documents is selected from a pool of documents to be processed by natural language processing. A document is a single unit of text to be processed by natural language processing. A document, for example, may be a Tweet, a Facebook® status update, a blog, an e-mail, an online comment, a transcribed phone conversation, etc. According to some example embodiments, the seed set of documents may contain one document or more than one document.

The example method 100 then continues at step 120 to receiving annotations of the seed set of documents. Step 120 may be facilitated by generating, for each document in the seed set of documents, a first human readable prompt configured to elicit an annotation of said document, and receiving annotations of the seed set of documents elicited by the first human readable prompts. The human readable prompts may be generated by the one or more processors in the natural language platform. The human readable prompts may be, for example, in the form of a graphical user interface (“GUI”). The human readable prompts may be generated locally or remotely.

According to some example embodiments, the annotations of the seed set of documents may comprise classification of the documents into one or more categories among a plurality of categories. This type of annotation is sometimes referred to as a “classification task” or a “document-scope task.” The categories are sometimes referred to as “labels.” An example of a classification task is sentiment analysis, where the annotations classify documents into sentiments such as the labels “positive” and “negative.” The sentiments may also include other labels, such as “very positive,” “very negative,” “neutral,” etc. Another example of a classification task is genre analysis, where the annotations classify documents into genres such as “mystery,” “suspense,” “romance,” “science fiction,” and “fantasy.” According to some example embodiments, the categories may be arranged in a hierarchy referred to as an “ontology.”

According to some example embodiments, the annotations of the seed set of documents may comprise selection of one or more portions of the documents relevant to one or more topics. This type of annotation is sometimes referred to as an “extraction task” or a “span-scope task.” The one or more topics are sometimes referred to as “labels.” An example of an extraction task is named entity recognition, where the annotations identify portions of documents relevant to a person, a company and/or a location. Another example of an extraction task is part-of-speech tagging, where the annotations identify portions of documents that are noun phrases and/or verb phrases.

According to some example embodiments, the annotations may be manual annotations supplied by human analysts and/or crowd-sourced workers. The annotations may also be supplied by machines.

The example method 100 then continues at step 130 to training a natural language model using the annotated documents. Step 130 may be conducted by the one or more processors in the natural language platform. When Step 130 is performed for the first time, the annotated seed set of documents is used to train the natural language model. Note that only a subset of all available annotated documents may be used. A natural language model is an artificial intelligence (“AI”) model that is used to conduct natural language processing on the documents. Training is the machine learning process used to generate and/or improve a natural language model. More details about training are described in non-provisional application U.S. patent application Ser. No. 14/964,517, filed Dec. 9, 2015, which is again incorporated by reference in its entirety.

The example method 100 then determines at step 140 whether the trained model has reached a predetermined performance level. Step 140 may be conducted by the one or more processors in the natural language platform. According to some example embodiments, the performance level of the trained model may be evaluated using a cross validation technique, a comparison to a held-out test set, or other techniques generally known in the art.

If the trained model has reached a predetermined performance level, the example method 100 ends. If the trained model has not reached a predetermined performance level, then additional optimization techniques may be performed to further refine the natural language model. Here, the example method 100 continues at step 145 to determining documents in the pool having uncertain natural language processing results. Step 145 may be conducted by the one or more processors in the natural language platform. The uncertainty of natural language processing results may be based on the trained natural language model and/or the received annotations. The degree of uncertainty may be measured by a number of different approaches, as described below in the detailed description of step 145. The documents thus determined are documents informative to the machine learning process.

The example method 100 continues from step 145 to step 150: selecting a new set of documents to be annotated. The new set of documents comprises documents having uncertain natural language processing results, or documents that, when annotated, are likely to reinforce or improve the accuracy of the natural language model. Step 150 may be conducted by the one or more processors in the natural language platform. This process is sometimes referred to as “queuing.” According to some example embodiments, the new set of documents may contain one document or more than one document.

According to some example embodiments, selecting the new set of documents comprises selecting documents similar to documents that have already been selected in the second set of documents. This way, the machine learning process is reinforced by annotating even more documents having uncertain natural language processing results.

The example method 100 continues from step 150 to step 160: receiving annotations of the new set of documents. Step 160 may be facilitated by generating, for each document in the new set of documents, a second human readable prompt configured to elicit an annotation of said document, and receiving annotations of the new set of documents elicited by the second human readable prompts. The human readable prompts may be generated by the one or more processors in the natural language platform. The human readable prompts may be, for example, displayable in a GUI. The human readable prompts may be generated locally or remotely.

According to some example embodiments, the second human readable prompt may be configured to elicit a true-or-false answer aimed at resolving uncertainty in the natural language processing results. For example, a true-or-false question may be asked when an annotator has already annotated a document, and the aggregation process (described in Section II) determines that independent verification of the annotator's selection is appropriate, because the annotator has a low performance level (discussed below); when multiple annotators provide conflicting annotations for the document; when the natural language model confidently disagrees with an annotator; or when the natural language model has a low or medium level of confidence about a prediction which has not been annotated previously. In extraction tasks, the second human readable prompt may be configured to elicit a true-or-false answer for partially-agreed-upon annotations (i.e., when two annotators select overlapping but unequal locations in the text).

According to some example embodiments, the annotations of the new set of documents may comprise classification of the documents into one or more categories among a plurality of categories. According to some example embodiments, the annotations of the new set of documents may comprise selection of one or more portions of the documents relevant to one or more topics.

The example method 100 continues from step 160 back to step 130: training the natural language model using the annotated documents. According to some example embodiments, the natural language model may be trained using all annotated documents that have been accumulated during the process, or any subset thereof. For example, the natural language model may be trained using only the annotated new set of documents. The iterative process continues until the model has reached the predetermined performance level.

According to some example embodiments, Step 140 may be omitted. In such a case, steps 145, 150 and 160 may be repeated until all the documents in the pool have been annotated, or may be repeated for a predetermined number of times. The predetermined number of times may be 1.
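The iterative flow of the example method 100 can be summarized in a short sketch. This is a schematic illustration under stated assumptions, not the platform's implementation: the function `intelligent_queuing` and its parameters (`annotate`, `train`, `evaluate`, `uncertainty`, `batch_size`, `max_rounds`) are hypothetical placeholders for the annotation prompts, the training process, the performance evaluation, and the uncertainty measure described in the text.

```python
from typing import Any, Callable, Dict, List, Tuple


def intelligent_queuing(
    pool: List[str],
    seed: List[str],
    annotate: Callable[[str], Any],                     # stands in for steps 120/160
    train: Callable[[Dict[str, Any]], Any],             # stands in for step 130
    evaluate: Callable[[Any, Dict[str, Any]], float],   # stands in for step 140
    uncertainty: Callable[[Any, str], float],           # stands in for step 145
    target: float,
    batch_size: int = 2,
    max_rounds: int = 10,
) -> Tuple[Any, Dict[str, Any]]:
    """Train, test, queue the most uncertain documents, and repeat."""
    # Steps 110-120: annotate the seed set.
    labeled = {doc: annotate(doc) for doc in seed}
    # Step 130: first training pass.
    model = train(labeled)

    for _ in range(max_rounds):
        # Step 140: stop once the performance level is reached.
        if evaluate(model, labeled) >= target:
            break
        remaining = [doc for doc in pool if doc not in labeled]
        if not remaining:
            break
        # Steps 145-150: queue the documents with the most uncertain results.
        remaining.sort(key=lambda doc: uncertainty(model, doc), reverse=True)
        # Step 160: collect annotations for the new set.
        for doc in remaining[:batch_size]:
            labeled[doc] = annotate(doc)
        # Back to step 130: retrain on everything annotated so far.
        model = train(labeled)

    return model, labeled
```

Retraining on all accumulated annotations is only one of the options the text allows; training on a subset, such as only the new set, would replace the last `train(labeled)` call.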

Referring to FIG. 2, a diagram illustrating an example 200 of a classification task is shown, according to some example embodiments. In the example 200, the annotator is annotating an online comment shown in box 210. The annotator classifies the online comment into one or more of four categories (labels). Here the categories are genres “Action” 220, “Comedy” 230, “Horror” 240 and “Mystery” 250.

Referring to FIG. 3, a diagram illustrating an example 300 of an extraction task is shown, according to some example embodiments. In the example 300, the annotator selects one or more portions of the text in box 310 and marks the portions as one or more of four topics (labels) 320, 330, 340 and 350.

Referring to FIG. 4, a diagram illustrating an example 400 of evaluating the performance level of a trained model using cross validation is shown, according to some example embodiments. Cross validation is a technique used to measure the performance of a predictive model. For example, in a 10-fold cross validation, all the annotated documents are divided into 10 subsets of equal size. Each time, 9 of the subsets are used to train a “test” natural language model, the trained “test” model conducts an analysis of the documents in the remaining 1 subset, and the result of the analysis is compared to the annotations of the remaining 1 subset. The process is repeated until every combination of 9 subsets has been used to train a “test” model. The results of the comparisons are represented by several parameters: accuracy, precision, recall, and F-score. The higher these parameters are, the better the performance of the original model is, since the “test” models were generated using the same machine learning process and with a subset of the same annotated documents. Thus, the predetermined performance level may be defined as threshold value(s) of one or more of these parameters. As shown in example 400, as the number of annotated documents increases (because of the iterative process in the example method 100), the performance level increases.
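The cross validation procedure described above can be sketched as follows. The sketch reports accuracy only; precision, recall and F-score would be computed from the same held-out predictions. The function `k_fold_accuracy` and its `train` and `predict` callables are hypothetical stand-ins for the machine learning process, and the interleaved split into folds is an assumption made for brevity.

```python
from typing import Any, Callable, List


def k_fold_accuracy(
    docs: List[Any],
    labels: List[Any],
    train: Callable[[List[Any], List[Any]], Any],
    predict: Callable[[Any, Any], Any],
    k: int = 10,
) -> float:
    """Estimate accuracy by training k "test" models on k-1 folds each."""
    n = len(docs)
    # Assumption: interleaved folds, which differ in size by at most one.
    folds = [list(range(i, n, k)) for i in range(k)]

    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        # Train a "test" model on the other k-1 subsets.
        model = train([docs[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        # Compare its predictions with the annotations of the held-out subset.
        correct += sum(predict(model, docs[i]) == labels[i] for i in held_out)

    return correct / n
```

The returned value can then be compared against the predetermined performance level to decide whether another round of annotation is needed.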

b. Further Example Details for Selection of the Seed Set of Documents

Now, step 110 in the example method 100 of selecting a seed set of documents to be annotated is described with further details below. The seed set of documents is usually selected when there is no trained model available. To speed up the machine learning process, the seed set of documents is selected with the goal of exposing the machine learning process to as many different types of documents as possible. In other words, the diversity of the documents should be reflected in the seed set of documents. Thus, according to some example embodiments, the seed set of documents may be selected such that they are evenly distributed among different document types. According to some example embodiments, exact duplicates and/or near duplicates may be removed from the seed set of documents.
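Removal of exact and near duplicates could be done in many ways; the text does not prescribe one. The sketch below assumes, purely for illustration, that two documents are near duplicates when the Jaccard similarity of their lowercased word sets reaches a threshold. The function name `drop_duplicates` and the default threshold are hypothetical.

```python
from typing import List, Set


def drop_duplicates(docs: List[str], near_threshold: float = 0.9) -> List[str]:
    """Keep the first document of every group of exact or near duplicates."""
    kept: List[str] = []
    kept_tokens: List[Set[str]] = []

    for doc in docs:
        tokens = set(doc.lower().split())
        is_duplicate = False
        for other in kept_tokens:
            union = tokens | other
            # Jaccard similarity; two empty documents count as identical.
            similarity = len(tokens & other) / len(union) if union else 1.0
            if similarity >= near_threshold:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(doc)
            kept_tokens.append(tokens)

    return kept
```

The pairwise comparison is quadratic in the number of kept documents, which is acceptable for a small seed set but would likely need a hashing-based approach for a large pool.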

According to some example embodiments, selecting the seed set of documents may be unsupervised. Unsupervised machine learning techniques such as topic modeling may be used. Topic modeling is used to machine-discover topics within a pool of documents.

Referring to FIG. 5, a diagram illustrating an example 500 of topic modeling is shown, according to some example embodiments. Topic modeling discovers a number of common themes (topics) represented by a list of key words, and groups documents into these topics. As shown in example 500, three topics 510, 520 and 530 have been discovered, each with a list of key words and a list of documents that can be viewed. More details about topic modeling are described in non-provisional application U.S. patent application Ser. No. 14/964,517, filed Dec. 9, 2015, which is again incorporated by reference in its entirety.

According to some example embodiments, the seed set of documents may be selected such that at least one document is selected within each machine-discovered topic. Thus the machine learning process is exposed to all of the topics discovered by topic modeling. According to some example embodiments, selecting at least one document within each machine-discovered topic may be based on well-known sampling techniques such as stratified sampling.
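Selecting at least one document per machine-discovered topic can be sketched as a simple stratified sample. The sketch assumes that topic modeling has already assigned each document to one topic; the mapping `doc_topics`, the function name `seed_from_topics`, and the fixed random seed are hypothetical and used only to make the example reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, List


def seed_from_topics(
    doc_topics: Dict[str, str],
    per_topic: int = 1,
    rng_seed: int = 0,
) -> List[str]:
    """Draw up to `per_topic` documents at random from every topic."""
    by_topic: Dict[str, List[str]] = defaultdict(list)
    for doc, topic in doc_topics.items():
        by_topic[topic].append(doc)

    rng = random.Random(rng_seed)
    seed: List[str] = []
    for topic in sorted(by_topic):
        docs = by_topic[topic]
        # Each topic is a stratum; sample without replacement inside it.
        seed.extend(rng.sample(docs, min(per_topic, len(docs))))
    return seed
```

Because every topic contributes at least one document, the seed set covers all discovered topics even when some topics are much smaller than others.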

According to some example embodiments, the topic modeling algorithm may generate hierarchical relationships among the topics (referred to as hierarchical clustering).

According to some example embodiments, the seed set of documents may be selected based on a keyword search. For example, a first group of search results is generated by searching keywords related to a first document type, and a second group of search results is generated by searching keywords related to a second document type. At least one document within the first group and at least one document within the second group are selected. According to some example embodiments, the search may be conducted on metadata of the documents.
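A keyword-based selection might look like the following sketch. It assumes a plain case-insensitive substring match over the document text, which is a simplification of whatever search facility an actual platform would use; the function name `seed_from_keywords` and the structure of `keyword_groups` are hypothetical.

```python
from typing import Dict, List


def seed_from_keywords(
    docs: List[str],
    keyword_groups: Dict[str, List[str]],
) -> Dict[str, str]:
    """Pick one matching document for each document type.

    `keyword_groups` maps a document type to the keywords related to it.
    """
    picked: Dict[str, str] = {}
    for doc_type, keywords in keyword_groups.items():
        for doc in docs:
            text = doc.lower()
            already_used = doc in picked.values()
            if not already_used and any(k.lower() in text for k in keywords):
                # First search result of this group joins the seed set.
                picked[doc_type] = doc
                break
    return picked
```

Searching metadata instead of text would only change what is passed in as `docs`, for example author names or posting times rendered as strings.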

According to some example embodiments, the seed set of documents may be selected based on confidence levels generated by analysis of the documents by one or more existing natural language models. The one or more existing natural language models may be, for example, off-the-shelf natural language models such as Idibon® public models that can be used for sentiment analysis on English language documents.

Referring to FIG. 6, a diagram illustrating an example 600 of a confidence level is shown, according to some example embodiments. As shown in example 600, text of document 610 “Idibon's NLP is great!!!” is analyzed by a natural language model. The natural language model outputs a confidence level for each of the possible categories (labels) “positive” 620, “neutral” 630, and “negative” 640. A confidence level is a measure of how confidently the model classifies the document into a category (label). As shown in example 600, “positive” 620 has the highest confidence level 0.77.

A document may be selected for the seed set of documents when, for example, none of the confidence levels for the categories (labels) are high enough. Example confidence levels would be as follows: “positive”: 0.6, “neutral”: 0.4, and “negative”: 0.3. A document may be excluded from the seed set of documents when, for example, the confidence level for one category (label) is sufficiently high. Example confidence levels would be as follows: “positive”: 0.8, “neutral”: 0.2, and “negative”: 0.1. This way, the documents having the most uncertain natural language processing results according to the existing models (i.e., the documents that cause the most confusion) are selected for human annotation.
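The confidence-based rule can be expressed in a few lines. The text does not state what counts as "high enough"; the cutoff of 0.7 below is an assumption chosen only so that the two sets of example confidence levels fall on opposite sides of it. The function names are hypothetical.

```python
from typing import Dict, List


def is_uncertain(confidences: Dict[str, float], cutoff: float = 0.7) -> bool:
    """True when no category (label) reaches the assumed cutoff."""
    return max(confidences.values()) < cutoff


def select_uncertain(
    scored_docs: Dict[str, Dict[str, float]],
    cutoff: float = 0.7,
) -> List[str]:
    """Return the documents whose results are uncertain under the cutoff."""
    return [doc for doc, conf in scored_docs.items() if is_uncertain(conf, cutoff)]


scored = {
    "doc A": {"positive": 0.6, "neutral": 0.4, "negative": 0.3},  # selected
    "doc B": {"positive": 0.8, "neutral": 0.2, "negative": 0.1},  # excluded
}
queue = select_uncertain(scored)
```

Under this assumed cutoff, only "doc A" is queued for human annotation, matching the two examples given in the text.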

According to some example embodiments, the seed set of documents may be selected manually. Experts or project managers that have expertise in processing a specific type of document may conduct the initial selection manually.

According to some example embodiments, the seed set of documents may be selected based on random sampling. The size of the seed set may vary depending on the desired confidence level and confidence interval.

According to some example embodiments, the seed set of documents may be selected based on stratified random sampling. The strata may be determined based on a variety of factors including intervals/categories in metadata fields (e.g., time of post, author), or document clusters as determined by metadata similarity, text similarity, document lengths, etc.
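As one possible illustration of stratified random sampling, the sketch below groups documents by a metadata field and draws the same number of documents from each stratum. The function name, the `author` field, and the fixed per-stratum count are assumptions made for the example; other allocation schemes (e.g., proportional to stratum size) would fit the description equally well.

```python
import random
from collections import defaultdict


def stratified_sample(documents, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` documents at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for doc in documents:
        strata[stratum_of(doc)].append(doc)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical pool: strata are defined by the "author" metadata field.
docs = [{"id": i, "author": "alice" if i % 3 else "bob"} for i in range(12)]
picked = stratified_sample(docs, lambda d: d["author"], per_stratum=2)
```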

According to some example embodiments, the seed set of documents may be selected based on document ordering, e.g., the first N documents may be selected.

c. Further Example Details for Selection of the New Set of Documents

Now, step 145 in the example method 100 of determining documents having uncertain natural language processing results is described in further detail below.

According to some example embodiments, determining documents having uncertain natural language processing results may be based on a confidence level generated by analysis of the documents by the trained model. In a way similar to the above description with reference to FIG. 6, the trained model analyzes the documents in the pool and outputs confidence levels for the documents. A document may be determined as having uncertain natural language processing results when, for example, none of the confidence levels for the categories (labels) is high enough. A document may be determined as not having uncertain natural language processing results when, for example, the confidence level for one category (label) is sufficiently high. This way, the documents having the most uncertain natural language processing results according to the trained model (i.e., the documents that cause the most confusion) are selected for a new round of human annotation.

According to some example embodiments, determining documents having uncertain natural language processing results may be based on a level of disagreement among more than one annotator. More than one annotator may provide annotations for the same document, and their annotations might disagree with each other.

Referring to FIG. 7, a diagram illustrating an example 700 of a level of disagreement is shown, according to some example embodiments. As shown in example 700, after many (e.g., hundreds of) documents are annotated, a disagreement level (equivalently, an agreement level) is generated for each category (label). For each category (label), documents with the most agreement and with the least agreement are identified. In example 700, documents with the most agreement and with the least agreement for the category (label) “Other” are shown. The agreement level for a document with respect to a label is determined based on the equation: abs(2*[# annotators that select the label]/[# annotators]−1). Thus, documents with the smallest absolute value have the least agreement and documents with the largest absolute value have the most agreement. Note that, before the absolute value is taken, large negative values represent strong agreement that the document is not the label. For example, if 3 out of 4 annotators select a label for that document, the agreement level will be 0.5.
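The agreement equation above translates directly into a short function. The sketch below also ranks documents by that value to find the ones with the least agreement; the helper name `least_agreed` and the input layout are assumptions made for illustration.

```python
def agreement_level(num_selecting, num_annotators):
    """abs(2 * [# annotators that select the label] / [# annotators] - 1)."""
    if num_annotators <= 0:
        raise ValueError("at least one annotator is required")
    return abs(2 * num_selecting / num_annotators - 1)


def least_agreed(votes_by_doc, n):
    """Return the n document ids with the lowest agreement for one label.

    `votes_by_doc` maps a document id to a pair
    (annotators selecting the label, total annotators).
    """
    ranked = sorted(votes_by_doc, key=lambda d: agreement_level(*votes_by_doc[d]))
    return ranked[:n]


# 3 of 4 annotators select the label, as in the example above.
example = agreement_level(3, 4)
```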

Therefore, for each category (label), a number of documents with the least agreement may be determined as having uncertain natural language processing results. According to some example embodiments, the documents having the most uncertain natural language processing results according to the received annotations (i.e., the documents that cause the most confusion) can be used to identify similar documents that are also likely to have uncertain natural language processing results. These similar documents can be selected for a new round of human annotation. Document similarity can be calculated automatically based on text or metadata similarity. In other words, selecting the second set of documents comprises selecting documents similar to documents that have a high level of disagreement among more than one annotator.

According to some example embodiments, the level of disagreement is determined by assigning more weight to annotators with better known performance levels. A performance level for each annotator may be generated based on past experience with this annotator.

Referring to FIG. 8, a diagram illustrating an example 800 of an annotator performance level is shown, according to some example embodiments. As shown in example 800, the performance levels of four annotators 810, 820, 830 and 840 are evaluated based on their agreement with other annotators (referred to as Inter-Annotator Agreement or IAA), shown in the right column. The IAA scores may be calculated based on the Krippendorff's alpha method generally known in the art.
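The disclosure does not state how the weights enter the calculation. One plausible reading, sketched below purely as an assumption, replaces the annotator counts in the FIG. 7 equation with sums of per-annotator weights, so that annotators with higher IAA scores move the result more. The scores shown are hypothetical.

```python
def weighted_agreement(votes, weights):
    """Weighted variant of abs(2 * selecting / total - 1).

    `votes` maps annotator name -> True/False for one label;
    `weights` maps annotator name -> performance weight (e.g., an IAA score).
    """
    total = sum(weights[name] for name in votes)
    if total <= 0:
        raise ValueError("total annotator weight must be positive")
    selecting = sum(weights[name] for name, selected in votes.items() if selected)
    return abs(2 * selecting / total - 1)


votes = {"Bob": True, "Eve": False, "Alice": True}
iaa_scores = {"Bob": 0.9, "Eve": 0.3, "Alice": 0.8}  # hypothetical scores
level = weighted_agreement(votes, iaa_scores)  # about 0.7
```

With equal weights the function reduces to the unweighted equation, giving about 0.3333 for the same votes.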

According to some example embodiments, one or more additional natural language models may be trained at step 130, and determining documents having uncertain natural language processing results may be based on a level of disagreement among multiple trained models. Multiple natural language models may be trained, for example, by training using different subsets of the annotated documents. Multiple natural language models may also be trained, for example, by applying the same annotated set of documents to different base models.

The level of disagreement may be determined in a way similar to the description with reference to FIG. 7. For each category (label), a number of documents with the least agreement may be determined as having uncertain natural language processing results. This way, the documents having the most uncertain natural language processing results (i.e., the documents that cause the most confusion) are selected for a new round of human annotation.

According to some example embodiments, the level of disagreement is determined by assigning more weight to models with better known performance levels. The performance level of a model may be determined in a way similar to the description with reference to FIG. 4.

According to some example embodiments, the seed set or the new set of documents may be selected based on TF-IDF. TF-IDF is generally known in the art as a way of measuring how important individual words are to a pool of documents. After the performance of feature extraction and feature selection (described in more detail in non-provisional application U.S. patent application Ser. No. 14/964,525, filed Dec. 9, 2015, which is again incorporated by reference in its entirety), statistics are generated across the entire document pool, including the frequencies of occurrence of each feature within the document pool as measured by TF-IDF. Such statistics may be stored in a table associating each feature with the calculated statistics. In intelligent queuing, these statistics may be used to select one or more documents from the document pool for human annotation. For example, intelligent queuing may select documents containing rare features (contained in few other documents or contained only in documents that have not been annotated), to improve the natural language model's understanding of such features.
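A simplified sketch of selecting documents that contain rare features follows. It uses only the inverse-document-frequency component of TF-IDF, treats whitespace-separated tokens as features, and scores each document by its rarest feature; all of these are assumptions made for brevity rather than the feature extraction of the incorporated application.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """Map each feature to log(N / number of documents containing it)."""
    n = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    return {feat: math.log(n / count) for feat, count in doc_freq.items()}


def rank_by_rarity(docs):
    """Order document ids so that those with the rarest features come first."""
    idf = inverse_document_frequency(docs)

    def score(doc_id):
        return max((idf[t] for t in set(docs[doc_id])), default=0.0)

    return sorted(docs, key=score, reverse=True)


# Hypothetical pool; "zeugma" appears in only one document.
pool = {
    "d1": ["great", "service"],
    "d2": ["great", "product"],
    "d3": ["great", "service", "zeugma"],
    "d4": ["great", "product", "service"],
}
queue = rank_by_rarity(pool)  # d3 is queued first for human annotation
```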

According to some example embodiments, the new set of documents may be selected based on a confusion matrix. For example, documents similar to those documents that were determined to be misclassified by the natural language model may be selected for queuing.

An apparatus for generating a natural language model may comprise one or more processors configured to perform the steps described above.

A non-transitory computer readable medium may comprise instructions that, when executed by a processor, cause the processor to perform the steps described above.

II. Annotation Aggregation

Aspects of the present disclosure are presented to have multiple human annotators annotate the same sample, and to combine these annotations to assist the machine learning process, so that the impact of the inaccuracy of individual annotators can be reduced. In addition, combining the annotations may determine whether there is sufficient agreement about the document for it to be used in the machine learning process for training the natural language model. These aspects of the present disclosure are sometimes referred to as “Annotation Aggregation.”

Referring to FIG. 9, a flow chart illustrating an example method 900 for generating a natural language model is shown, according to some example embodiments. The example method 900 begins with receiving, at step 910, more than one annotation of a document. The concepts of “annotations” and “document” are similar to those described with respect to Intelligent Queuing in Section I. The annotations of the document may comprise selection of one or more portions of the document relevant to one or more topics (extraction tasks), or comprise selection of one or more categories among a plurality of categories (classification tasks). The more than one annotation may be provided by multiple annotators, or may be provided by a single annotator at different times.

The example method 900 then continues to calculating, at step 920, a level of agreement among the received annotations. The more than one annotation of a same document might disagree with each other. The level of agreement measures the degree of agreement or disagreement among these annotations.

The example method 900 then continues at step 930 to determining whether one of Criterion 1, Criterion 2 and Criterion 3 is satisfied. The determination may be based at least in part on the level of agreement.

If Criterion 1 is satisfied, the example method 900 continues to determining, at step 940, an aggregated annotation representing an aggregation of information in the received annotations, and then to training, at step 950, a natural language model using the aggregated annotation. The concepts of “training” and “natural language model” are similar to those described with respect to Intelligent Queuing in Section I.

If Criterion 2 is satisfied, the example method 900 continues to generating, at step 960, at least one human readable prompt configured to receive additional annotations of the document. The human readable prompts may be, for example, in a form displayable in a GUI. The human readable prompts may be generated locally or remotely. According to some example embodiments, Criterion 2 is satisfied when the number of annotations received is less than a minimum number. Thus, an aggregated annotation is not generated until a minimum number of annotations are received.

If Criterion 3 is satisfied, the example method 900 continues to discarding, at step 970, the received annotations. The discarded annotations will not be used for training natural language models. Criterion 3 may be satisfied when it is determined that there is too much disagreement among the annotations about the document, and that either additional annotations will not resolve the disagreement or it is not worth the time spent trying to resolve the disagreement.

Referring to FIG. 10, a flow chart illustrating high level concepts of annotation aggregation is shown, according to some example embodiments. At 1020, a document is read from document database 1010. Human annotations 1030 of the document are received and supplied to aggregation process 1070.

Optionally, model prediction 1040, inter-label relationships 1050 and per-user inter-annotator agreement (“IAA”) scores 1060 may also be supplied to aggregation process 1070.

Model prediction 1040 is a result of an analysis of the document by one or more existing natural language models. The one or more existing natural language models may be, for example, off-the-shelf natural language models such as Idibon® public models that can be used for sentiment analysis on English language documents. The result of the analysis, for example, may be the confidence levels described above with reference to FIG. 6. An example use of model prediction 1040 in aggregation process 1070 is to treat a model prediction as another “human annotation,” or a “human annotation” with more weight or less weight. In other words, at least one of the annotations received may comprise a prediction by a pre-existing natural language model.

Inter-label relationships 1050 are the relationships among a plurality of categories (labels). The relationships, for example, may be the relative ordering of the categories, e.g., “positive”: 1st, “neutral”: 2nd, and “negative”: 3rd. The relationship may be represented by a numerical value assigned to each of the categories, e.g., “very positive”: +3, “positive”: +1, “neutral”: 0, “negative”: −1, and “very negative”: −3.

Per-user IAA scores 1060 are measurements of known performance levels of the annotators. Per-user IAA scores 1060, for example, may be the IAA scores described above with reference to FIG. 8. An example use of IAA scores in aggregation process 1070 is to assign more weight to annotators with higher IAA scores. Alternatively, aggregation process 1070 may also receive an indication that an annotator is an expert in a particular area, whose annotation may be treated as conclusive.

In other words, aggregation process 1070 may be optionally based on the relationships among a plurality of categories, a result of an analysis of the document by one or more existing natural language models, and/or known performance levels of the annotators.

Aggregation process 1070 may output aggregated annotation 1080 which is then used for model training 1085, and may output candidate aggregations 1090 for queuing 1095, i.e., additional annotations.

Referring to FIG. 11, a flow chart illustrating an example method 1100 referred to as a “Majority Vote” is shown, according to some example embodiments. The example method 1100 begins by supplying user-label matrix 1110 for processing. The user-label matrix contains annotations by multiple annotators. An example of a user-label matrix is shown below:

            Label
User        Sports    Entertainment    Politics
Bob         Y         N                N
Eve         N         Y                N
Alice       Y         N                N

In the example above, for a particular document, annotators (users) “Bob” and “Alice” select category (label) “Sports,” while annotator “Eve” selects category “Entertainment.” No one selects category “Politics.”

The example method 1100 continues to process, at step 1120, each category (label). According to some example embodiments, the level of agreement may be determined for each category based on a percentage of annotations that select said category.

For example, the level of agreement for a category may be determined as the percentage of annotations that select said category, or the percentage of annotations that do not select said category, whichever is greater. The agreement levels for the above example are shown below, the last line being the agreement levels:

              Label
User          Sports      Entertainment    Politics
Bob           Y           N                N
Eve           N           Y                N
Alice         Y           N                N
Aggregated    Y           N                N
(Agreement)   (0.6667)    (0.6667)         (1.0)

In another example, the level of agreement for a category may be determined based on the following formula:

abs(2*(#Y/(#Y+#N))−1)

where “#Y” is the number of annotators that select the category, and “#N” is the number of annotators that do not select the category. Based on the formula, the agreement levels in the above example are 0.3333 for “Sports,” 0.3333 for “Entertainment,” and 1 for “Politics.”
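Both agreement measures can be checked against the example user-label matrix with a few lines of code. The dictionary layout and function names below are illustrative assumptions; only the two formulas come from the text above.

```python
# Example user-label matrix from the text: True means "Y", False means "N".
matrix = {
    "Bob":   {"Sports": True,  "Entertainment": False, "Politics": False},
    "Eve":   {"Sports": False, "Entertainment": True,  "Politics": False},
    "Alice": {"Sports": True,  "Entertainment": False, "Politics": False},
}


def counts(matrix, label):
    """Return (#Y, #N) for one label."""
    yes = sum(1 for row in matrix.values() if row[label])
    return yes, len(matrix) - yes


def majority_agreement(matrix, label):
    """Greater of the share selecting and the share not selecting the label."""
    yes, no = counts(matrix, label)
    return max(yes, no) / (yes + no)


def scaled_agreement(matrix, label):
    """abs(2 * (#Y / (#Y + #N)) - 1)."""
    yes, no = counts(matrix, label)
    return abs(2 * (yes / (yes + no)) - 1)


for label in ("Sports", "Entertainment", "Politics"):
    print(label,
          round(majority_agreement(matrix, label), 4),
          round(scaled_agreement(matrix, label), 4))
```

The first measure ranges from 0.5 to 1, while the second rescales the same information to the range 0 to 1.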

The example method 1100 continues to determine, at step 1130, whether the number of annotators (users) is at least a minimum number, i.e., whether the number of annotations received is at least a minimum number. If so, the example method 1100 continues to step 1140. Otherwise the example method 1100 continues to step 1160.

At step 1140, the example method 1100 determines whether the agreement level is at least a threshold value. If so, the example method 1100 continues to step 1150 (Criterion 1 for this example is satisfied). Otherwise the example method continues to step 1160. Note that in example method 1100, the determination at step 1140 is conducted for each category (label) separately. In the example where the agreement level is the percentage of annotations that select said category, or the percentage of annotations that do not select said category, whichever is greater, the threshold value may be 0.5, so that the determination at step 1140 is essentially a majority vote. The threshold value may be above 0.5, so that a super-majority vote is necessary.

At step 1150, the example method 1100 determines an aggregated annotation to be used in training a natural language model. The aggregated annotation may be selecting the category, when the percentage of annotations that select the category is greater, or not selecting the category, when the percentage of annotations that do not select the category is greater.

At step 1160, the example method 1100 determines whether the number of annotators (users) is less than a maximum number. If so, the example method 1100 continues to step 1170 (Criterion 2 for this example is satisfied). Otherwise the example method continues to step 1180 (Criterion 3 for this example is satisfied).

At step 1170, the example method 1100 queues the document for additional annotation. At least one human readable prompt may be generated to receive additional annotations.

At step 1180, the label is ignored. In other words, the annotations for the label are discarded and are not used for model training.

After step 1150, 1170 or 1180, the example method 1100 continues back to step 1120 to process the next label.

When the above example user-label matrix is processed by the example method 1100, a result like the example result below is shown for three combinations of minimum number of annotators, maximum number of annotators, and threshold agreement level:

                                         minimum: 2        minimum: 2         minimum: 2
                                         maximum: 3        maximum: 3         maximum: 4
Label           Assignment   Agreement   threshold: 0.6    threshold: 0.75    threshold: 0.75
Sports          Y            0.6667      TRAIN             IGNORE             QUEUE
Entertainment   N            0.6667      TRAIN             IGNORE             QUEUE
Politics        N            1.0         TRAIN             TRAIN              TRAIN

As shown in the table, each combination (columns 4, 5 and 6) defines a set of Criteria 1, 2 and 3. Take column 4 as an example and use N and A to denote the number of annotations and the agreement level, respectively. Criterion 1 is N≥2 & A≥0.6. Criterion 2 is N<2, or 2≤N<3 & A<0.6, and Criterion 3 is N≥3 & A<0.6. When Criterion 1 is satisfied, the result is “TRAIN,” i.e., an aggregated annotation is generated. When Criterion 2 is satisfied, the result is “QUEUE,” i.e., additional annotations are requested. When Criterion 3 is satisfied, the result is “IGNORE,” i.e., the annotations are discarded.

When the minimum number of annotators is 2, the maximum number of annotators is 3, and the threshold agreement level is 0.6, an aggregated annotation is generated for each of the three labels (“TRAIN”).

When the minimum number of annotators is 2, the maximum number of annotators is 3, and the threshold agreement level is 0.75, annotations for “Sports” and “Entertainment” are discarded (“IGNORE”), and an aggregated annotation is generated for “Politics” (“TRAIN”).

When the minimum number of annotators is 2, the maximum number of annotators is 4, and the threshold agreement level is 0.75, additional annotations for “Sports” and “Entertainment” are requested (“QUEUE”), and an aggregated annotation is generated for “Politics” (“TRAIN”).
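The per-label decision of steps 1130 through 1180 can be sketched as a single function that returns one of the three outcomes. The function name and return format are assumptions made for illustration, and the agreement level used is the "whichever is greater" percentage. A tie between "Y" and "N" can only reach "TRAIN" when the threshold is 0.5 or lower; the sketch then resolves it to "not selected," which is an arbitrary choice not addressed in the text.

```python
def majority_vote(yes, no, minimum, maximum, threshold):
    """Decide the outcome for one label.

    Returns (outcome, assignment): outcome is "TRAIN", "QUEUE" or "IGNORE";
    assignment is True/False for TRAIN and None otherwise.
    """
    n = yes + no
    agreement = max(yes, no) / n if n else 0.0
    # Steps 1130 and 1140: enough annotations and enough agreement (Criterion 1).
    if n >= minimum and agreement >= threshold:
        return "TRAIN", yes > no
    # Step 1160: room for more annotators (Criterion 2), else discard (Criterion 3).
    if n < maximum:
        return "QUEUE", None
    return "IGNORE", None


# (#Y, #N) per label, taken from the example user-label matrix.
labels = {"Sports": (2, 1), "Entertainment": (1, 2), "Politics": (0, 3)}
settings = [(2, 3, 0.6), (2, 3, 0.75), (2, 4, 0.75)]
results = {
    name: [majority_vote(y, n, *s)[0] for s in settings]
    for name, (y, n) in labels.items()
}
```

Under these assumptions `results` reproduces the three outcome columns described above for each label.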

Referring to FIG. 12, a flow chart illustrating an example method 1200 referred to as “Within Tolerance” is shown, according to some example embodiments.

As discussed above with reference to FIG. 10, inter-label relationships may be used in the aggregation processing. The relationships may be represented by a numerical value assigned to each of the categories, e.g., “very positive”: +3, “positive”: +1, “neutral”: 0, “negative”: −1, and “very negative”: −3.

The example method 1200 begins by supplying user-label matrix 1210 for processing. The example method 1200 continues to determine, at step 1220, whether the number of annotators (number of annotations received) is correct, e.g., at least a minimum number and no more than a maximum number. If so, the example method 1200 continues to step 1240. Otherwise the example method 1200 continues to step 1260 (Criterion 2 for this example is satisfied).

The example method 1200 receives, at step 1240, a label location table 1230. The label location table 1230 may be a table of the numerical values assigned to the labels.

At step 1240, the example method 1200 determines whether the locations of all selected labels (labels marked as “Y”) are within tolerance. In other words, the level of agreement here is the difference between the highest numerical value and the lowest numerical value among the selected categories. That difference is compared to a tolerance threshold. If the locations are within tolerance (difference no greater than threshold), the example method 1200 continues to step 1250 (Criterion 1 for this example is satisfied). If the locations are outside tolerance (difference greater than threshold), the example method 1200 continues to step 1270 (Criterion 3 for this example is satisfied).

For example, when the received annotations are A: very positive (+3), B: neutral (0), and C: negative (−1), and the tolerance threshold is 3, the locations are outside tolerance, because the difference between the highest and lowest values is 4.

At step 1250, the example method 1200 determines an aggregated annotation to be used in training a natural language model. According to some example embodiments, the aggregated annotation may be determined as selection of the category with the numerical value closest to a mean of the numerical values of all received annotations. For example, when the received annotations are A: very positive (+3), B: neutral (0), and C: negative (−1), the aggregated annotation may be positive (+1), which is closest to the mean (+0.6667).

According to some example embodiments, the aggregated annotation may be determined as selection of the category with the numerical value closest to a median of the numerical values of all received annotations. For example, when the received annotations are A: very positive (+3), B: neutral (0), and C: negative (−1), the aggregated annotation may be neutral (0), which is closest to the median (0).

At step 1260, the example method 1200 queues the document for additional annotation. At least one human readable prompt may be generated to receive additional annotations.

At step 1270, the document is ignored. In other words, the annotations for the document are discarded and are not used for model training.
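The "Within Tolerance" flow of steps 1220 through 1270 can be sketched as follows, using the example numerical values from the text as the label location table. The function signature is an assumption, each annotation is assumed to select exactly one label, and when two labels are equally close to the mean or median the sketch simply takes the first one in table order.

```python
import statistics

# Label location table using the example values given in the text.
LABEL_LOCATIONS = {
    "very positive": 3, "positive": 1, "neutral": 0,
    "negative": -1, "very negative": -3,
}


def within_tolerance(annotations, minimum, maximum, tolerance, use_median=False):
    """Return (outcome, aggregated label) for one document."""
    n = len(annotations)
    # Step 1220: wrong number of annotations -> queue for more (Criterion 2).
    if not (minimum <= n <= maximum):
        return "QUEUE", None
    values = [LABEL_LOCATIONS[label] for label in annotations]
    # Step 1240: spread between highest and lowest value versus the tolerance.
    if max(values) - min(values) > tolerance:
        return "IGNORE", None  # Criterion 3
    # Step 1250: pick the label closest to the mean (or median) of the values.
    center = statistics.median(values) if use_median else statistics.mean(values)
    label = min(LABEL_LOCATIONS, key=lambda name: abs(LABEL_LOCATIONS[name] - center))
    return "TRAIN", label  # Criterion 1


received = ["very positive", "neutral", "negative"]
strict = within_tolerance(received, 2, 3, tolerance=3)
by_mean = within_tolerance(received, 2, 3, tolerance=4)
by_median = within_tolerance(received, 2, 3, tolerance=4, use_median=True)
```

With a tolerance of 3 the example annotations are discarded; with a hypothetical tolerance of 4 they aggregate to "positive" by mean and to "neutral" by median, matching the worked examples above.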

Although annotation aggregation is described with reference to FIG. 11 and FIG. 12 for classification tasks, similar algorithms may also apply to extraction tasks. In extraction tasks, in addition to the level of agreement with respect to what category or categories to select, the level of agreement with respect to locations within the document is also considered in the aggregation process.

An apparatus for generating a natural language model may comprise one or more processors configured to perform the steps described above.

A non-transitory computer readable medium may comprise instructions that, when executed by a processor, cause the processor to perform the steps described above.

Referring to FIG. 13, the block diagram illustrates components of a machine 1300, according to some example embodiments, able to read instructions 1324 from a machine-readable medium 1322 (e.g., a non-transitory machine-readable medium, a machine-readable storage medium, a computer-readable storage medium, or any suitable combination thereof) and perform any one or more of the methodologies discussed herein, in whole or in part. Specifically, FIG. 13 shows the machine 1300 in the example form of a computer system (e.g., a computer) within which the instructions 1324 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1300 to perform any one or more of the methodologies discussed herein may be executed, in whole or in part.

In alternative embodiments, the machine 1300 operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine 1300 may operate in the capacity of a server machine 110 or a client machine in a server-client network environment, or as a peer machine in a distributed (e.g., peer-to-peer) network environment. The machine 1300 may include hardware, software, or combinations thereof, and may, as an example, be a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a cellular telephone, a smartphone, a set-top box (STB), a personal digital assistant (PDA), a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1324, sequentially or otherwise, that specify actions to be taken by that machine. Further, while only a single machine 1300 is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute the instructions 1324 to perform all or part of any one or more of the methodologies discussed herein.

The machine 1300 includes a processor 1302 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), or any suitable combination thereof), a main memory 1304, and a static memory 1306, which are configured to communicate with each other via a bus 1308. The processor 1302 may contain microcircuits that are configurable, temporarily or permanently, by some or all of the instructions 1324 such that the processor 1302 is configurable to perform any one or more of the methodologies described herein, in whole or in part. For example, a set of one or more microcircuits of the processor 1302 may be configurable to execute one or more modules (e.g., software modules) described herein.

The machine 1300 may further include a video display 1310 (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, a cathode ray tube (CRT), or any other display capable of displaying graphics or video). The machine 1300 may also include an alphanumeric input device 1312 (e.g., a keyboard or keypad), a cursor control device 1314 (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, an eye tracking device, or other pointing instrument), a storage unit 1316, a signal generation device 1318 (e.g., a sound card, an amplifier, a speaker, a headphone jack, or any suitable combination thereof), and a network interface device 1320.

The storage unit 1316 includes the machine-readable medium 1322 (e.g., a tangible and non-transitory machine-readable storage medium) on which are stored the instructions 1324 embodying any one or more of the methodologies or functions described herein, including, for example, any of the descriptions of FIGS. 1-12. The instructions 1324 may also reside, completely or at least partially, within the main memory 1304, within the processor 1302 (e.g., within the processor's cache memory), or both, before or during execution thereof by the machine 1300. The instructions 1324 may also reside in the static memory 1306.

Accordingly, the main memory 1304 and the processor 1302 may be considered machine-readable media 1322 (e.g., tangible and non-transitory machine-readable media). The instructions 1324 may be transmitted or received over a network 1326 via the network interface device 1320. For example, the network interface device 1320 may communicate the instructions 1324 using any one or more transfer protocols (e.g., HTTP). The machine 1300 may also represent example means for performing any of the functions described herein, including the processes described in FIGS. 1-12.

In some example embodiments, the machine 1300 may be a portable computing device, such as a smart phone or tablet computer, and have one or more additional input components (e.g., sensors or gauges) (not shown). Examples of such input components include an image input component (e.g., one or more cameras), an audio input component (e.g., a microphone), a direction input component (e.g., a compass), a location input component (e.g., a GPS receiver), an orientation component (e.g., a gyroscope), a motion detection component (e.g., one or more accelerometers), an altitude detection component (e.g., an altimeter), and a gas detection component (e.g., a gas sensor). Inputs harvested by any one or more of these input components may be accessible and available for use by any of the modules described herein.

As used herein, the term “memory” refers to a machine-readable medium 1322 able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1322 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions 1324. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing the instructions 1324 for execution by the machine 1300, such that the instructions 1324, when executed by one or more processors of the machine 1300 (e.g., processor 1302), cause the machine 1300 to perform any one or more of the methodologies described herein, in whole or in part. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as cloud-based storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, one or more tangible (e.g., non-transitory) data repositories in the form of a solid-state memory, an optical medium, a magnetic medium, or any suitable combination thereof.

Furthermore, the machine-readable medium 1322 is non-transitory in that it does not embody a propagating signal. However, labeling the tangible machine-readable medium 1322 as “non-transitory” should not be construed to mean that the medium is incapable of movement; the medium should be considered as being transportable from one physical location to another. Additionally, since the machine-readable medium 1322 is tangible, the medium may be considered to be a machine-readable device.

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute software modules (e.g., code stored or otherwise embodied on a machine-readable medium 1322 or in a transmission medium), hardware modules, or any suitable combination thereof. A “hardware module” is a tangible (e.g., non-transitory) unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor 1302 or a group of processors 1302) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.

In some embodiments, a hardware module may be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software encompassed within a general-purpose processor 1302 or other programmable processor 1302. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses 1308) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).

The various operations of example methods described herein may be performed, at least partially, by one or more processors 1302 that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors 1302 may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module implemented using one or more processors 1302.

Similarly, the methods described herein may be at least partially processor-implemented, a processor 1302 being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors 1302 or processor-implemented modules. As used herein, “processor-implemented module” refers to a hardware module in which the hardware includes one or more processors 1302. Moreover, the one or more processors 1302 may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines 1300 including processors 1302), with these operations being accessible via a network 1326 (e.g., the Internet) and via one or more appropriate interfaces (e.g., an API).

The performance of certain operations may be distributed among the one or more processors 1302, not only residing within a single machine 1300, but deployed across a number of machines 1300. In some example embodiments, the one or more processors 1302 or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors 1302 or processor-implemented modules may be distributed across a number of geographic locations.

Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine 1300 (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any suitable combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.

The present disclosure is illustrative and not limiting. Further modifications will be apparent to one skilled in the art in light of this disclosure and are intended to fall within the scope of the appended claims.

What is claimed is:
 1. A method for generating a natural language model, the method comprising: receiving more than one annotation of a document; calculating a level of agreement among the received annotations; determining that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determining an aggregated annotation representing an aggregation of information in the received annotations and training a natural language model using the aggregated annotation, when the first criterion is satisfied; generating at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and discarding the received annotations from use in training the natural language model, when the third criterion is satisfied.
 2. The method of claim 1, wherein the second criterion is satisfied when the number of annotations received is less than a minimum number.
 3. The method of claim 1, wherein the annotations of the document comprise selection of one or more portions of the document relevant to one or more topics.
 4. The method of claim 1, wherein the annotations of the document comprise selection of one or more categories among a plurality of categories.
 5. The method of claim 4, wherein the level of agreement is determined for each category based on a percentage of annotations that select said category.
 6. The method of claim 5, wherein: the first criterion is satisfied when the number of annotations received is at least a minimum number and the level of agreement for a category is at least a threshold level; and the aggregated annotation is determined as selecting or not selecting said category.
 7. The method of claim 5, wherein the second criterion is satisfied when the number of annotations received is less than a maximum number and the level of agreement is less than a threshold level.
 8. The method of claim 5, wherein the third criterion is satisfied when the number of annotations received is at least a maximum number and the level of agreement is less than a threshold level.
 9. The method of claim 4, wherein a numerical value is assigned to each of the plurality of categories.
 10. The method of claim 9, wherein: the level of agreement comprises a difference between the highest numerical value and the lowest numerical value among the selected categories; the first criterion is satisfied when the difference is no more than a threshold value; and the third criterion is satisfied when the difference is more than the threshold value.
 11. The method of claim 10, wherein the aggregated annotation is determined as selection of a category with the numerical value closest to a mean of the numerical values of all received annotations.
 12. The method of claim 10, wherein the aggregated annotation is determined as selection of a category with the numerical value closest to a median of the numerical values of all received annotations.
 13. The method of claim 1, wherein determining that the criterion among the first criterion, the second criterion, and the third criterion is satisfied is further based on a result of an analysis of the document by one or more pre-existing natural language models.
 14. The method of claim 1, wherein determining that the criterion among the first criterion, the second criterion, and the third criterion is satisfied is further based on known performance levels of annotators.
 15. The method of claim 1, wherein at least one of the annotations received comprises prediction by a pre-existing natural language model.
 16. An apparatus for generating a natural language model, the apparatus comprising one or more processors configured to: receive more than one annotation of a document; calculate a level of agreement among the received annotations; determine that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determine an aggregated annotation representing an aggregation of information in the received annotations and train a natural language model using the aggregated annotation, when the first criterion is satisfied; generate at least one human readable prompt configured to receive additional annotations of the document, when a second criterion is satisfied; and discard the received annotations from use in training the natural language model, when the third criterion is satisfied.
 17. The apparatus of claim 16, wherein the annotations of the document comprise selection of one or more categories among a plurality of categories.
 18. The apparatus of claim 17, wherein the level of agreement is determined for each category based on a percentage of annotations that select said category.
 19. The apparatus of claim 17, wherein a numerical value is assigned to each of the plurality of categories; the level of agreement comprises a difference between the highest numerical value and the lowest numerical value among the selected categories; the first criterion is satisfied when the difference is no more than a threshold value; and the third criterion is satisfied when the difference is more than a threshold value.
 20. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: receive more than one annotation of a document; calculate a level of agreement among the received annotations; determine that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determine an aggregated annotation representing an aggregation of information in the received annotations and train a natural language model using the aggregated annotation, when the first criterion is satisfied; generate at least one human readable prompt configured to receive additional annotations of the document, when a second criterion is satisfied; and discard the received annotations from use in training the natural language model, when the third criterion is satisfied. 
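
By way of illustration only, and without limiting the claims, the decision logic recited in claims 5 through 8 and claims 10 through 12 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Decision`, `decide_by_percentage`, and `decide_by_range` are hypothetical; measuring agreement as the larger of the selecting and non-selecting shares, requesting more annotations whenever fewer than the minimum number have been received, and breaking ties toward the earlier entry in `scale` are all assumptions the claims do not specify.

```python
from enum import Enum
from statistics import mean, median
from typing import Optional, Sequence, Tuple


class Decision(Enum):
    AGGREGATE = "aggregate"        # first criterion: train on the aggregated annotation
    REQUEST_MORE = "request_more"  # second criterion: prompt for additional annotations
    DISCARD = "discard"            # third criterion: exclude from training


def decide_by_percentage(
    votes: Sequence[bool], min_count: int, max_count: int, threshold: float
) -> Tuple[Decision, Optional[bool]]:
    """Per-category decision in the spirit of claims 5 to 8.

    votes[i] is True when annotation i selects the category.
    Returns the decision and, when aggregating, whether the category is selected.
    """
    n = len(votes)
    if n == 0 or n < min_count:
        return Decision.REQUEST_MORE, None
    share = sum(votes) / n                # percentage of annotations selecting the category
    agreement = max(share, 1.0 - share)   # assumption: agreement on either outcome counts
    if agreement >= threshold:
        return Decision.AGGREGATE, share >= 0.5
    if n < max_count:
        return Decision.REQUEST_MORE, None
    return Decision.DISCARD, None


def decide_by_range(
    values: Sequence[float],
    scale: Sequence[float],
    threshold: float,
    use_median: bool = False,
) -> Tuple[Decision, Optional[float]]:
    """Decision for numerically valued categories in the spirit of claims 10 to 12.

    values holds the numerical value of the category chosen by each annotation;
    scale holds the numerical values assigned to all available categories.
    """
    if not values or not scale:
        raise ValueError("values and scale must be non-empty")
    spread = max(values) - min(values)    # highest minus lowest selected value
    if spread > threshold:
        return Decision.DISCARD, None
    center = median(values) if use_median else mean(values)
    # Pick the category value closest to the mean (or median) of the annotations.
    closest = min(scale, key=lambda v: abs(v - center))
    return Decision.AGGREGATE, closest
```

For example, under these assumptions, three annotations that all select a category with a minimum of three and a threshold of 0.8 would be aggregated as selecting the category, whereas a two-to-one split would trigger a request for additional annotations until the maximum number is reached.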